Fine-tuning Language Models for Factuality
The fluency and creativity of large language models (LLMs) have
led to their widespread use, sometimes even as a replacement for traditional
search engines. Yet language models are prone to making convincing but
factually inaccurate claims, often referred to as 'hallucinations.' These
errors can inadvertently spread misinformation or harmfully perpetuate
misconceptions. Further, manual fact-checking of model responses is a
time-consuming process, making human factuality labels expensive to acquire. In
this work, we fine-tune language models to be more factual, without human
labeling and targeting more open-ended generation settings than past work. We
leverage two key recent innovations in NLP to do so. First, several recent
works have proposed methods for judging the factuality of open-ended text by
measuring consistency with an external knowledge base or simply a large model's
confidence scores. Second, the direct preference optimization algorithm enables
straightforward fine-tuning of language models on objectives other than
supervised imitation, using a preference ranking over possible model responses.
We show that learning from automatically generated factuality preference
rankings, generated either through existing retrieval systems or our novel
retrieval-free approach, significantly improves the factuality (percent of
generated claims that are correct) of Llama-2 on held-out topics compared with
RLHF or decoding strategies targeted at factuality. At 7B scale, compared to
Llama-2-chat, we observe 58% and 40% reduction in factual error rate when
generating biographies and answering medical questions, respectively.
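The direct preference optimization (DPO) objective mentioned above can be sketched in a few lines. This is a minimal single-pair illustration, not the paper's training code: the log-probability values and the `dpo_loss` name are hypothetical, and real training sums per-token log-probs from the policy and a frozen reference model over batches of preference pairs.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a response under the
    policy (logp_*) or the frozen reference model (ref_logp_*).
    """
    # Implicit reward of each response: beta * log(pi / pi_ref)
    reward_chosen = beta * (logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    # Logistic loss pushing the chosen reward above the rejected one
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# Illustrative numbers: the factuality-preferred (chosen) response already
# has higher relative likelihood under the policy, so the loss is small.
loss = dpo_loss(-12.0, -15.0, -13.0, -13.0)
```

Swapping the chosen and rejected arguments raises the loss, which is the gradient signal that moves probability mass toward the more factual response.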
Search and Rescue under the Forest Canopy using Multiple UAVs
We present a multi-robot system for GPS-denied search and rescue under the
forest canopy. Forests are particularly challenging environments for
collaborative exploration and mapping, in large part due to the existence of
severe perceptual aliasing which hinders reliable loop closure detection for
mutual localization and map fusion. Our proposed system features unmanned
aerial vehicles (UAVs) that perform onboard sensing, estimation, and planning.
When communication is available, each UAV transmits compressed tree-based
submaps to a central ground station for collaborative simultaneous localization
and mapping (CSLAM). To overcome high measurement noise and perceptual
aliasing, we use the local configuration of a group of trees as a distinctive
feature for robust loop closure detection. Furthermore, we propose a novel
procedure based on cycle consistent multiway matching to recover from incorrect
pairwise data associations. The returned global data association is guaranteed
to be cycle consistent, and is shown to improve both precision and recall
compared to the input pairwise associations. The proposed multi-UAV system is
validated both in simulation and during real-world collaborative exploration
missions at NASA Langley Research Center. (Comment: IJRR revision)
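The cycle-consistency idea behind the multiway-matching step can be illustrated on toy data: composing pairwise tree associations around a loop of robots (A to B, B to C, C back to A) should map every landmark to itself, and any pairwise match that breaks this is suspect. The dicts and function names below are illustrative assumptions, not the paper's algorithm, which recovers a consistent global association rather than merely detecting violations.

```python
def compose(f, g):
    """Compose two partial associations given as dicts (g after f)."""
    return {k: g[v] for k, v in f.items() if v in g}

def is_cycle_consistent(ab, bc, ca):
    """True if following A -> B -> C -> A maps every landmark to itself."""
    loop = compose(compose(ab, bc), ca)
    return all(k == v for k, v in loop.items())

# Consistent loop: landmark 0 in A matches 2 in B, then 1 in C, then 0 in A.
ab = {0: 2, 1: 0}
bc = {2: 1, 0: 3}
ca = {1: 0, 3: 1}
print(is_cycle_consistent(ab, bc, ca))      # True

# A single wrong pairwise match breaks the cycle.
bad_ca = {1: 5, 3: 1}
print(is_cycle_consistent(ab, bc, bad_ca))  # False
```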
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
A trustworthy real-world prediction system should be well-calibrated; that
is, its confidence in an answer is indicative of the likelihood that the answer
is correct, enabling deferral to a more expensive expert in cases of
low-confidence predictions. While recent studies have shown that unsupervised
pre-training produces large language models (LMs) that are remarkably
well-calibrated, the most widely-used LMs in practice are fine-tuned with
reinforcement learning with human feedback (RLHF-LMs) after the initial
unsupervised pre-training stage, and results are mixed as to whether these
models preserve the well-calibratedness of their ancestors. In this paper, we
conduct a broad evaluation of computationally feasible methods for extracting
confidence scores from LLMs fine-tuned with RLHF. We find that with the right
prompting strategy, RLHF-LMs verbalize probabilities that are much better
calibrated than the model's conditional probabilities, enabling fairly
well-calibrated predictions. Through a combination of prompting strategy and
temperature scaling, we find that we can reduce the expected calibration error
of RLHF-LMs by over 50%.
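Expected calibration error (ECE), the metric the 50% reduction refers to, bins predictions by stated confidence and compares each bin's average confidence to its empirical accuracy. The sketch below uses made-up data for illustration; the paper's evaluation extracts the confidences from RLHF-LMs via prompting.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin fraction) * |accuracy - avg confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / total) * abs(accuracy - avg_conf)
    return ece

# Perfectly calibrated toy case: 80% confidence, 4 of 5 correct.
print(expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0]))  # ~0

# Overconfident case: 90% confidence but only half correct.
print(expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5))  # ~0.4
```

Temperature scaling, mentioned in the abstract, reduces this same quantity by dividing logits by a scalar fitted on held-out data before taking the softmax.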
Dickkopf-related protein 1 (Dkk1) regulates the accumulation and function of myeloid derived suppressor cells in cancer
Tumor–stroma interactions contribute to tumorigenesis. Tumor cells can educate the stroma at primary and distant sites to facilitate the recruitment of heterogeneous populations of immature myeloid cells, known as myeloid-derived suppressor cells (MDSCs). MDSCs suppress T cell responses and promote tumor proliferation. One outstanding question is how the local and distant stroma modulate MDSCs during tumor progression. Down-regulation of β-catenin is critical for MDSC accumulation and immune suppressive functions in mice and humans. Here, we demonstrate that stroma-derived Dickkopf-1 (Dkk1) targets β-catenin in MDSCs, thus exerting immune suppressive effects during tumor progression. Mice bearing extraskeletal tumors show significantly elevated levels of Dkk1 in bone microenvironment relative to tumor site. Strikingly, Dkk1 neutralization decreases tumor growth and MDSC numbers by rescuing β-catenin in these cells and restores T cell recruitment at the tumor site. Recombinant Dkk1 suppresses β-catenin target genes in MDSCs from mice and humans and anti-Dkk1 loses its antitumor effects in mice lacking β-catenin in myeloid cells or after depletion of MDSCs, demonstrating that Dkk1 directly targets MDSCs. Furthermore, we find a correlation between CD15(+) myeloid cells and Dkk1 in pancreatic cancer patients. We establish a novel immunomodulatory role for Dkk1 in regulating tumor-induced immune suppression via targeting β-catenin in MDSCs
Prediction of cardiovascular outcomes with machine learning techniques: application to the Cardiovascular Outcomes in Renal Atherosclerotic Lesions (CORAL) study.
Background: Data derived from the Cardiovascular Outcomes in Renal Atherosclerotic Lesions (CORAL) study were analyzed in an effort to employ machine learning methods to predict the composite endpoint described in the original study.
Methods: We identified 573 CORAL subjects with complete baseline data and the presence or absence of a composite endpoint for the study. These data were subjected to several models including a generalized linear (logistic-linear) model, support vector machine, decision tree, feed-forward neural network, and random forest, in an effort to attempt to predict the composite endpoint. The subjects were arbitrarily divided into training and testing subsets according to an 80%:20% distribution with various seeds. Prediction models were optimized within the CARET package of R.
Results: The best performance of the different machine learning techniques was that of the random forest method which yielded a receiver operator curve (ROC) area of 68.1%±4.2% (mean ± SD) on the testing subset with ten different seed values used to separate training and testing subsets. The four most important variables in the random forest method were SBP, serum creatinine, glycosylated hemoglobin, and DBP. Each of these variables was also important in at least some of the other methods. The treatment assignment group was not consistently an important determinant in any of the models.
Conclusion: Prediction of a composite cardiovascular outcome was difficult in the CORAL population, even when employing machine learning methods. Assignment to either the stenting or best medical therapy group did not serve as an important predictor of composite outcome.
Clinical Trial Registration: ClinicalTrials.gov, NCT00081731
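The evaluation protocol in the CORAL analysis (an 80%/20% split repeated over several seeds, scored by ROC area) can be sketched as follows. The synthetic subjects and the trivial single-feature "score" below are stand-ins for the study's data and random forest; only the split-and-score protocol is shown, and all names are assumptions.

```python
import random

def roc_auc(scores, labels):
    """ROC AUC as the probability a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def split_80_20(data, seed):
    """Shuffle with a given seed and split 80% train / 20% test."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Synthetic subjects: (feature, outcome); the feature weakly predicts outcome.
data_rng = random.Random(0)
subjects = [(data_rng.gauss(y, 1.0), y) for y in [0, 1] * 100]

aucs = []
for seed in range(10):  # ten seeds, mirroring the study design
    _train, test = split_80_20(subjects, seed)
    scores = [x for x, _ in test]
    labels = [y for _, y in test]
    aucs.append(roc_auc(scores, labels))
mean_auc = sum(aucs) / len(aucs)
```

Reporting the mean and spread of AUC across seeds, as the study does, guards against a single lucky or unlucky split driving the headline number.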